AN ALGORITHM WHICH RECOGNIZES NATURAL LANGUAGE
DIALOGUE EXPRESSIONS


COLBY AND PARKISON

OUTLINE
INTRODUCTORY - Discussion of language as code, other approaches:
    sentence versus word dictionary using projection
    rules to yield an interpretation from word definitions;
    experience with old PARRY.
PROBLEMS - dialogue problems and methods. Constraints. Special cases.
Preprocessing - dict words only
    translations
    (ROGER- WHAT HAPPENED TO FILLERS?)
    contractions
    expansions
    synonyms
    negation
Segmenting - prepositions, wh-words, meta-verbs
    give list
Matching - simple and compound patterns
    association with semantic functions
    first coarsening - drop fillers - give list
    second coarsening - drop one word at a time
    dangers of insertion and restoration
    Recycle condition - sometimes a pattern containing pronouns
        is matched, like "DO YOU AVOID THEM". If THEM could be
        a number of different things and PARRY's answer depends on
        which one it is, then the current value of the anaphora,
        THEM, is substituted for THEM and the resulting pattern
        is looked up. Hopefully, this will produce a match to a
        more specific pattern, like "DO YOU AVOID MAFIA".
    default condition - pass surface to memory;
        change topic or level
Advantages - real-time performance, pragmatic adequacy and
    effectiveness, performance measures;
    "learning" by adding patterns.
    PARRY1 ignored word order - penalty too great.
    PARRY1 too sequential, taking first pattern it found
        rather than looking at whole input and then deciding.
    PARRY1 had patterns strung out throughout procedures,
        thus cumbersome for programmer to see what patterns were.
Limitations - typical failures, possible remedies:
    NO NOUNS, ETC. - NO MORE GENERAL RULES LIKE NOUN PHRASE;
    AMBIGUITIES SLIDE THROUGH WHEREAS A PARSER WOULD CATCH THEM
Summary

INTRODUCTION

To recognize is to identify something as an instance of the
"same again". This familiarity is possible because of recurrent
characteristics of the world which repeat themselves over and over
again. We shall describe an algorithm which recognizes recurrent
characteristics of natural language dialogue expressions through a
multi-stage sequence of functions which progressively transforms
these input expressions into a pattern which eventually best matches
a more abstract stored pattern. The name of the stored pattern has a
pointer to the name of a response function which decides what to do
once the input has been characterized. Here we shall discuss
only the recognizing functions, except for one response function
(anaphoric substitution) which interactively aids the
characterization process. How the response functions operate will be
described in a future communication (Faught, Colby, Parkison).
In constructing and testing a simulation of paranoid
processes, we were faced with the problem of reproducing paranoid
linguistic behavior in a diagnostic psychiatric interview. The
diagnosis of paranoid states, reactions or modes is made by
clinicians who judge a degree of correspondence between what they
observe linguistically in an interview and their conceptual model of
paranoid behavior. There exists a high degree of agreement about this
conceptual model, which relies mainly on what an interviewee says and
how he says it.
Natural language is a life-expressing code people use for
communication with themselves and others. In a real-life dialogue
such as a psychiatric interview, the participants have interests,
intentions, and expectations which are revealed in their linguistic
expressions. To produce effects on an interviewer which he would
judge similar to the effects produced by a paranoid patient, an
interactive simulation of a paranoid patient must be able to
demonstrate typical paranoid interview behavior. To achieve the
desired effects, the paranoid model must have the ability to deal
with the linguistic behavior of the interviewer.
There are a number of approaches one might consider for an
ideal handling of dialogue expressions. One approach would be
to construct a dictionary of all expressions which could possibly
arise in an interview. Associated with each expression would be its
interpretation, depending on dialogue context, which in turn must
somehow be defined. For obvious reasons, no one takes this approach
seriously. Instead of an expression dictionary, one might
construct a word dictionary and then use projection rules to yield an
interpretation of a sentence from the dictionary definitions. This
has been the approach adopted by conventional linguistic parsers.
Such a method performs adequately as long as the dictionary involves
only a few hundred words, each word having only one or two senses,
and the dialogue is limited to a mini-world of a few objects and
relations. But the problems which arise in a psychiatric interview
conducted in unrestricted English are too great for this method to be
useful for real-time dialogues, in which immediacy of response is an
important requirement.
There is little consensus knowledge about how humans process
natural language. They can be shown to possess some knowledge of
grammar rules, but this does not entail that they use a grammar in
interpreting and producing language. Irregular verb-tenses and
noun-plurals do not follow rules; yet people use thousands of them.
One school of linguistics believes that people possess full
transformational grammars for processing language. In our view this
position seems dubious. Language is what is recognized, but the
processes involved may not be at all linguistic. Originally,
transformational grammars were not designed to "understand" a large
subset of English; they represented a set of axioms for deciding
whether a string is "grammatical". Efforts to use them for other
purposes have not been fruitful.
An analysis of what one's problem actually is should guide
the selection or invention of methods appropriate to that problem's
solution. Our problem was not to develop a consistent and general
theory of language nor to assert empirically testable hypotheses
about how people process language. Our problem was to write an
algorithm which recognizes what is being said in a dialogue and what
is being said about it in order to make a response such that a sample
of I-O pairs from the paranoid model is judged similar to a sample of
I-O pairs from paranoid patients. From the perspective of
artificial intelligence as an attempt to get computers to perform
human intellectual tasks, methods had to be devised for the task of
participating in a human dialogue in a paranoid-patient-like way. We
are not making an existence claim that our strategy represents the
way people process language. We sought efficacious methods which
could operate efficiently in real time. We would not deny that our
methods for this task are possibly analogous to the methods humans
use. And since our methods provide a general way of mapping from many
"surface" expressions to a single stored pattern, these methods are
useful for any type of dialogue algorithm.
To perform the task of managing communicative uses and
effects of natural language, we adopted a strategy of transforming
the input until a pattern is achieved which completely or partially
matches a more abstract stored pattern. This strategy has proved
adequate for our purposes a satisfactory percentage of the time. (No
one expects an algorithm to be successful 100% of the time, since not
even humans, the best natural language systems around, achieve this
level of performance.) The power of this method for natural language
dialogues lies in its ability to ignore unrecognizable expressions
and irrelevant details. A conventional parser doing word-by-word
analysis fails when it cannot find one or more of the input words in
its dictionary. It is too fragile for a dialogue in that it must know
every word; it cannot guess.
In early versions of the paranoid model (PARRY1), many (but
not all) of the pattern recognition mechanisms were weak because they
allowed the elements of the pattern to be order-independent. For
example, consider the following expressions:
    (1) WHERE DO YOU WORK?
    (2) WHAT SORT OF WORK DO YOU DO?
    (3) WHAT IS YOUR OCCUPATION?
    (4) WHAT DO YOU DO FOR A LIVING?
    (5) WHERE ARE YOU EMPLOYED?
In PARRY1 a procedure would scan these expressions looking for an
information-bearing contentive such as "work", "for a living", etc.
If it found such a contentive along with a "you" or "your" in the
expression, regardless of word order, it would respond to the
expression as if it were a question about the nature of one's work.
(There is some doubt this even qualifies as a pattern, since
interrelations between words are ignored and only their presence is
considered.) An insensitivity to word order has the advantage that
lexical items representing different parts of speech can represent
the same concept, e.g. "work" as noun or as verb. But a price is paid
for this resilience and elasticity. We found from experience that,
since English relies heavily on word order to convey the meaning of
its messages, the penalty of misunderstanding (to be distinguished
from non-understanding) was too great. Hence in PARRY2, as will be
described shortly, all the patterns require a specified word order.
It seems agreed that for high-complexity problems it is
useful to have constraints. Diagnostic psychiatric interviews
(and especially those conducted over teletypes) have several natural
constraints. First, clinicians are trained to ask certain questions
in certain ways. These stereotypes can be treated as special cases.
Second, only a few hundred standard topics are brought up by
interviewers, who are trained to use everyday expressions and
especially those used by the patient himself. When the interview is
conducted over teletypes, expressions tend to be shortened since the
interviewer tries to increase the information transmission rate over
the slow channel of a teletype. (It is said that short expressions
are more grammatical, but think about the phrase "Now now, there
there.") Finally, teletyped interviews represent written speech.
Speech is known to be 50% redundant; hence many unrecognized words
can be ignored without losing the meaning of the message. Written
speech is loaded with idioms, cliches, pat phrases, etc. - all
being easy prey for a pattern-recognition approach. It is futile to
try to decode an idiom by analyzing the meanings of its individual
words. One knows what an idiom means or one does not.
We shall now describe the pattern recognition functions of the
algorithm.

PREPROCESSING

Each word in the input expression is first looked up in a
dictionary of (1300) words which remains in core in machine language.
The dictionary consists of a list of words and the names of
word-classes they can be translated into. The size of the dictionary
is determined by the patterns, i.e. each dictionary word appears in
one or more patterns. (ROGER- SHOW PIECE OF DICT?) Words in the
dictionary reflect PARRY2's main interests. If a word in the input is
not in the dictionary, it is dropped from the pattern being formed.
Thus if the input were:
    WHAT IS YOUR CURRENT OCCUPATION?
and the word "current" is not in the dictionary, the pattern at this
stage becomes:
    (WHAT IS YOUR OCCUPATION)
The question mark is thrown away as redundant, since questions are
recognized by word order. (A statement followed by a question mark
(YOU GAMBLE?) is considered to be communicatively equivalent in its
effects to that statement followed by a period.) Synonymic
translations of words are made, so that the pattern becomes, for
example:
    (WHAT BE YOU JOB)
Groups of words are translated into a single word-class name, so
that, for example, "for a living" becomes "job".
Misspellings are also handled in the dictionary by
simply rewriting a recognized misspelling into its correct form. Thus
"yyou" becomes "you". The common misspellings were gathered from over
4000 interviews with versions of the paranoid model. Other
misspellings do not appear in the pattern because they are not
represented in the dictionary.
Certain juxtaposed words are contracted into a single
word, e.g. "GET ALONG WITH" becomes "GETALONGWITH". This is done (1)
to deal with groups of words which are represented as a single
element in the stored pattern, and (2) to prevent segmentation from
occurring at the wrong places, such as at a preposition inside an
idiom. Besides these contractions, certain expansions are made, so
that, for example, "DON'T" becomes "DO NOT" and "I'D" becomes "I
WOULD".
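
To make these stages concrete, here is a minimal sketch of the
preprocessing pass in Python. The dictionary, group, expansion, and
misspelling tables are illustrative stand-ins, not PARRY2's actual
data (the real dictionary held about 1300 words in core, in machine
language).

```python
# A sketch of preprocessing: misspelling repair, expansions, group
# contractions, and dictionary translation with unknown words dropped.
# All table entries below are illustrative.
MISSPELLINGS = {"YYOU": "YOU"}
EXPANSIONS = {"DON'T": ["DO", "NOT"], "I'D": ["I", "WOULD"]}
# Word groups contracted into a single pattern element.
GROUPS = {("FOR", "A", "LIVING"): "JOB",
          ("GET", "ALONG", "WITH"): "GETALONGWITH"}
# The dictionary maps each known word to the word-class name it is
# translated into; a word absent from the dictionary is dropped.
DICTIONARY = {"WHAT": "WHAT", "IS": "BE", "ARE": "BE", "YOU": "YOU",
              "YOUR": "YOU", "OCCUPATION": "JOB", "WORK": "JOB",
              "DO": "DO", "NOT": "NOT", "I": "I", "WOULD": "WOULD",
              "JOB": "JOB", "GETALONGWITH": "GETALONGWITH"}

def preprocess(expression):
    # The question mark is thrown away; questions are recognized
    # by word order.
    words = expression.replace("?", "").upper().split()
    # Repair recognized misspellings and expand contractions.
    expanded = []
    for w in words:
        w = MISSPELLINGS.get(w, w)
        expanded.extend(EXPANSIONS.get(w, [w]))
    # Contract known word groups into single elements.
    out, i = [], 0
    while i < len(expanded):
        for group, single in GROUPS.items():
            if tuple(expanded[i:i + len(group)]) == group:
                out.append(single)
                i += len(group)
                break
        else:
            out.append(expanded[i])
            i += 1
    # Translate via the dictionary; unknown words are dropped.
    return [DICTIONARY[w] for w in out if w in DICTIONARY]

print(preprocess("WHAT IS YOUR CURRENT OCCUPATION?"))
# -> ['WHAT', 'BE', 'YOU', 'JOB']
```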

SEGMENTING

Another weakness in the crude pattern matching of PARRY1 was
that it took the entire input expression as its basic processing
unit. Hence if only two words were recognized in an eight-word input,
the risk of misunderstanding was great. We needed a way of dealing
with units shorter than the entire input expression.
Expert telegraphists stay six to twelve words behind a
received message before transcribing it (Bryan and Harter, 1897).
Translators wait until they have heard 4-6 words before they begin
translating. Aided by a heuristic from machine-translation work by
Wilks ( ), we devised a way of bracketing the pattern constructed up
to this point into shorter segments using the list of words in Fig. 1.
The new pattern formed is termed either "simple", having no
delimiters within it, or "compound", i.e. being made up of two or more
simple patterns. A simple pattern might be:
    ( WHAT BE YOU JOB )
whereas a compound pattern would be:
    (( WHY BE YOU ) ( IN HOSPITAL ))
Our experience with this method of segmentation shows that compound
patterns from psychiatric dialogues rarely consist of more than three
or four fragments.
After certain verbs ("THINK", "FEEL", etc.) a bracketing occurs
to replace the commonly omitted "THAT", such that:
    ( I THINK YOU BE AFRAID )
becomes
    (( I THINK ) ( YOU BE AFRAID ))
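
The following sketch shows the kind of bracketing involved. Since the
delimiter list of Fig. 1 is not reproduced here, the delimiter and
meta-verb sets below are small illustrative samples of the
prepositions, wh-words, and meta-verbs the text mentions.

```python
# A sketch of segmentation. DELIMITERS stands in for the Fig. 1 list;
# its entries are illustrative only.
DELIMITERS = {"IN", "ON", "AT", "TO", "WHY", "WHAT", "WHEN", "WHERE"}
META_VERBS = {"THINK", "FEEL"}  # bracket after these for omitted "THAT"

def segment(pattern):
    """Bracket a preprocessed pattern into simple patterns.

    A delimiter opens a new segment; a meta-verb closes the current
    segment after itself. Returns a simple pattern (one segment) or
    a compound pattern (a list of simple patterns).
    """
    segments, current = [], []
    for word in pattern:
        if word in DELIMITERS and current:
            segments.append(current)   # delimiter opens a new segment
            current = [word]
        else:
            current.append(word)
            if word in META_VERBS:
                segments.append(current)   # meta-verb closes a segment
                current = []
    if current:
        segments.append(current)
    return segments if len(segments) > 1 else segments[0]

print(segment(["WHY", "BE", "YOU", "IN", "HOSPITAL"]))
# -> [['WHY', 'BE', 'YOU'], ['IN', 'HOSPITAL']]
print(segment(["I", "THINK", "YOU", "BE", "AFRAID"]))
# -> [['I', 'THINK'], ['YOU', 'BE', 'AFRAID']]
```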

PREPARATION FOR MATCHING

Conjunctions serve only as markers for the segmenter, and
they are dropped out after segmentation.
Negations are handled by extracting the "NOT" from the
pattern and assigning a value to a global variable which indicates to
the algorithm that the expression is negative in form. When a pattern
is finally matched, this variable is consulted. Some patterns have a
pointer to a pattern of opposite meaning if a "NOT" could reverse
their meanings. If this pointer is present and a "NOT" is found,
then the pattern matched is replaced by its opposite, e.g. ( I NOT
TRUST YOU ) is replaced by the pattern ( I MISTRUST YOU ). We have
not yet observed the troublesome case of "he gave me not one but two
messages". (There is no need to scratch where it doesn't itch.)
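
A sketch of this bookkeeping follows. The opposite-pattern table holds
the text's running example; in PARRY2 itself the negation flag is a
global variable, which the return value stands in for here.

```python
# A sketch of negation handling. Only some stored patterns carry a
# pointer to an opposite-meaning pattern; this table is illustrative.
OPPOSITES = {("I", "TRUST", "YOU"): ("I", "MISTRUST", "YOU")}

def extract_negation(pattern):
    """Strip NOT from the pattern and flag the expression as negative."""
    if "NOT" in pattern:
        return [w for w in pattern if w != "NOT"], True
    return pattern, False

def apply_negation(matched_pattern, negated):
    # Consulted once a pattern is finally matched: if a NOT was seen
    # and the matched pattern points to an opposite, use the opposite.
    key = tuple(matched_pattern)
    if negated and key in OPPOSITES:
        return list(OPPOSITES[key])
    return matched_pattern

pattern, negated = extract_negation(["I", "NOT", "TRUST", "YOU"])
print(pattern)                            # ['I', 'TRUST', 'YOU']
print(apply_negation(pattern, negated))   # ['I', 'MISTRUST', 'YOU']
```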

MATCHING AND RECYCLING

The algorithm now attempts to match the segmented patterns
with stored patterns, of which there are currently about 2000.
First a complete and perfect match is sought. When a match is found,
the stored pattern name has a pointer to the name of a response
function which decides what to do further. If a match is not found,
further transformations of the pattern are carried out and a "fuzzy"
match is tried.
For fuzzy matching at this stage, the elements in the pattern
are dropped one at a time and a match attempted each time. This
allows the algorithm to ignore familiar words in unfamiliar contexts.
For example, "well" is important in "Are you well?" but meaningless
in "Well are you?".
Deleting one element at a time results in, for example, the
pattern:
    ( WHAT BE YOU MAIN PROBLEM )
becoming successively:
    (a) ( BE YOU MAIN PROBLEM )
    (b) ( WHAT YOU MAIN PROBLEM )
    (c) ( WHAT BE MAIN PROBLEM )
    (d) ( WHAT BE YOU PROBLEM )
    (e) ( WHAT BE YOU MAIN )
Since the stored pattern in this case matches (d), (e) would not be
constructed. We found it unwise to delete more than one element,
since our segmentation method usually yields segments containing a
small number (1-4) of words.
Dropping an element at a time provides a probability
threshold for partial matching which is a function of the length of
the segment. If a segment consists of five elements, four of the five
must be present in a particular order (with the fifth element missing
in any position) for a match to occur. If a segment contains four
elements, three must match - and so forth.
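
In Python, the single-deletion fuzzy match might look like the sketch
below. The stored-pattern table and its response-function name
(ANSWER-PROBLEM) are illustrative assumptions.

```python
# A sketch of fuzzy matching by dropping one element at a time. The
# stored pattern table and response-function name are illustrative.
STORED = {("WHAT", "BE", "YOU", "PROBLEM"): "ANSWER-PROBLEM"}

def match(pattern, stored=STORED):
    """Seek a perfect match, then try each single-element deletion.

    Deleting at most one element keeps the implicit threshold: a
    five-element segment still needs four elements in order.
    """
    if tuple(pattern) in stored:
        return stored[tuple(pattern)]
    for i in range(len(pattern)):
        candidate = tuple(pattern[:i] + pattern[i + 1:])
        if candidate in stored:
            return stored[candidate]   # later deletions not constructed
    return None                        # default condition

print(match(["WHAT", "BE", "YOU", "MAIN", "PROBLEM"]))
# matches deletion (d) above -> 'ANSWER-PROBLEM'
```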
The transformations described above result in a progressive
coarsening of the patterns by deletion. Substitutions are also made
in certain cases. Some patterns contain pronouns which could stand
for a number of different things of importance to PARRY2. The
pattern:
    ( DO YOU AVOID THEM )
could refer to the Mafia, or racetracks, or other patients. When
such a pattern is recognized, the pronoun is replaced by its current
anaphoric value as determined by the response functions, and a more
specific pattern such as:
    ( DO YOU AVOID MAFIA )
is looked up. In many cases, the meaning of a pattern containing a
pronoun is clear without any substitution. In the pattern:
    (( HOW DO THEY TREAT YOU ) ( IN HOSPITAL ))
the meaning of THEY is clarified by ( IN HOSPITAL ).
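
A sketch of this recycle step follows. The anaphora table and the
response-function name are illustrative; in PARRY2 the pronoun's
current value is supplied by the response functions.

```python
# A sketch of anaphoric substitution and re-matching. The anaphoric
# values and the stored pattern's response name are illustrative.
ANAPHORA = {"THEM": "MAFIA", "THEY": "MAFIA"}
STORED = {("DO", "YOU", "AVOID", "MAFIA"): "DENY-AVOIDING-MAFIA"}

def recycle(pattern, stored=STORED):
    """Replace pronouns by their current values and look up again."""
    substituted = [ANAPHORA.get(w, w) for w in pattern]
    return stored.get(tuple(substituted))

print(recycle(["DO", "YOU", "AVOID", "THEM"]))
# -> 'DENY-AVOIDING-MAFIA'
```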

COMPOUND-PATTERN MATCH

When more than one simple pattern is detected in the input, a
second matching is attempted. The methods used are similar to the
first matching except that they operate at the segment level rather
than at the single-element level. Certain patterns, such as ( HELLO )
and ( I THINK ), are dropped because they are considered meaningless.
If a complete match is not found, then simple patterns are dropped,
one at a time, from the compound pattern. This allows the input:
    (( HOW DO YOU COME ) ( TO BE ) ( IN HOSPITAL ))
to match the stored pattern:
    (( HOW DO YOU COME ) ( IN HOSPITAL ))
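
The segment-level analogue is sketched below. The stored compound
pattern is the text's example; its response name (EXPLAIN-ADMISSION)
and the "meaningless" list entries are illustrative.

```python
# A sketch of the compound-pattern match: drop meaningless simple
# patterns, then drop one remaining segment at a time.
MEANINGLESS = {("HELLO",), ("I", "THINK")}
STORED = {(("HOW", "DO", "YOU", "COME"), ("IN", "HOSPITAL")):
          "EXPLAIN-ADMISSION"}

def compound_match(segments, stored=STORED):
    kept = [tuple(s) for s in segments if tuple(s) not in MEANINGLESS]
    if tuple(kept) in stored:
        return stored[tuple(kept)]
    # The same move as the element-level fuzzy match, one level up.
    for i in range(len(kept)):
        candidate = tuple(kept[:i] + kept[i + 1:])
        if candidate in stored:
            return stored[candidate]
    return None   # default condition: response functions take over

print(compound_match([["HOW", "DO", "YOU", "COME"],
                      ["TO", "BE"],
                      ["IN", "HOSPITAL"]]))
# -> 'EXPLAIN-ADMISSION'
```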

If no match can be found at this point, the algorithm has
arrived at a default condition and the appropriate response functions
decide what to do. For example, in a default condition, the model
may assume control of the interview, asking the interviewer a
question, continuing with the topic under discussion, or introducing
a new topic.

ADVANTAGES AND LIMITATIONS

As mentioned, one of the main advantages of a
characterization strategy is that it can ignore as irrelevant what it
does NOT recognize. There are several million words in English,
each possessing anywhere from one to one hundred senses. To construct
a machine-usable word dictionary of this magnitude is out of the
question at this time. Recognition of natural language input such
as that described above allows real-time interaction in a dialogue,
since it avoids becoming ensnared in combinatorial disambiguations
and long chains of inference which would slow a dialogue algorithm
down to impracticality, if it could function at all. The price paid
for pattern matching is that sometimes, but rarely, ambiguities slip
through.
A drawback to PARRY1 was that it reacted to the first pattern
it found in the input rather than characterizing the input as fully
as possible and then deciding what to do based on a number of tests.
Another practical difficulty with PARRY1, from a programmer's
viewpoint, was that elements of the patterns were strung out in
various procedures throughout the algorithm. It was often a
considerable chore for the programmer to determine whether a given
pattern was present and precisely where it was. In the method
described above, the patterns are all collected in one part of the
data base, where they can easily be examined.
Concentrating all the patterns in the data base gives PARRY2
a limited "learning" ability. When an input fails to match any
stored pattern, or matches an incorrect one as judged by a human
operator, a pattern matching the input can be put into the data base
automatically. If the new pattern has the same meaning as a
previously stored pattern, the human operator must provide the name
of the appropriate response function. If he does not remember the
name, he may rephrase the input in a form recognizable to PARRY2,
which will name the response function associated with the
rephrasing. These mechanisms are not "learning" in the commonly used
sense, but they do allow a person to transfer his knowledge into
PARRY2's data base with very little redundant effort.
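
A sketch of how such a pattern might be added appears below. The
data-base representation (a pattern-to-response-name table) and all
names are illustrative assumptions.

```python
# A sketch of the pattern-adding mechanism. The operator either names
# the response function or rephrases the input so the model can name
# it. The table and names here are illustrative.
def add_pattern(stored, new_pattern, response_name=None, rephrasing=None):
    if response_name is None and rephrasing is not None:
        # Recover the name from a rephrasing the model recognizes.
        response_name = stored.get(tuple(rephrasing))
    if response_name is None:
        raise ValueError("a response-function name is required")
    stored[tuple(new_pattern)] = response_name

stored = {("WHAT", "BE", "YOU", "JOB"): "ANSWER-JOB"}
add_pattern(stored, ("WHAT", "BE", "YOU", "TRADE"),
            rephrasing=("WHAT", "BE", "YOU", "JOB"))
print(stored[("WHAT", "BE", "YOU", "TRADE")])   # 'ANSWER-JOB'
```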

PERFORMANCE

We have a number of performance measures on PARRY1 along a
number of dimensions, including "linguistic non-comprehension". That
is, judges estimated PARRY1's abilities along this dimension on a 0-9
scale. They also rated human patients and a "random" version of
PARRY1 in this manner. (GIVE BAR-GRAPH HERE AND DISCUSS) We have
collected ratings of PARRY2 along this dimension to determine whether
the characterization process represents an improvement over PARRY1.
(FRANK AND KEN EXPERIMENT)